Similarity Constraints in Beam-search Building of Predictive Clustering Trees

نویسندگان

  • Dragi Kocev
  • Jan Struyf
چکیده

We investigate how inductive databases (IDBs) can support global models, such as decision trees. We focus on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm, similar to that of C4.5. We propose a new induction algorithm for PCTs based on beam-search. This has three advantages over the regular method: (1) it returns a set of PCTs satisfying the user constraints instead of just one PCT; (2) it better allows for pushing of user constraints into the induction algorithm; and (3) it is less susceptible to myopia. In addition, we propose similarity constraints for PCTs, which improve the diversity of the resulting PCT set.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Beam Search Induction and Similarity Constraints for Predictive Clustering Trees

Much research on inductive databases (IDBs) focuses on local models, such as item sets and association rules. In this work, we investigate how IDBs can support global models, such as decision trees. Our focus is on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction buil...

متن کامل

Clustering Trees with Instance Level Constraints

Constrained clustering investigates how to incorporate domain knowledge in the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance level constraints, such as must-link and cannot-link. This type of constraints has been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative ...

متن کامل

An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...

متن کامل

A novel local search method for microaggregation

In this paper, we propose an effective microaggregation algorithm to produce a more useful protected data for publishing. Microaggregation is mapped to a clustering problem with known minimum and maximum group size constraints. In this scheme, the goal is to cluster n records into groups of at least k and at most 2k_1 records, such that the sum of the within-group squ...

متن کامل

(Inductive) Querying Environment for Predictive Clustering Trees

Inductive databases tightly integrate databases with data mining. Besides data, an inductive database also stores models that have been obtained by running data mining algorithms on the data. By means of a querying environment, the user can query the database and retrieve particular models. In this paper, we propose such a querying environment. It can be used for building new models and for sea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006